Domain Adaptation: Overfitting and Small Sample Statistics

نویسندگان

Dean P. Foster

Sham M. Kakade

Ruslan Salakhutdinov

چکیده

We study the prevalent problem when a test distribution differs from the training distribution. We consider a setting where our training set consists of a small number of sample domains, but where we have many samples in each domain. Our goal is to generalize to a new domain. For example, we may want to learn a similarity function using only certain classes of objects, but we desire that this similarity function be applicable to object classes not present in our training sample (e.g. we might seek to learn that “dogs are similar to dogs” even though images of dogs were absent from our training set). Our theoretical analysis shows that we can select many more features than domains while avoiding overfitting by utilizing data-dependent variance properties. We present a greedy feature selection algorithm based on using T -statistics. Our experiments validate this theory showing that our T -statistic based greedy feature selection is more robust at avoiding overfitting than the classical greedy procedure.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Domain Adaptation: A Small Sample Statistical Approach

متن کامل

Regularization techniques for fine-tuning in neural machine translation

We investigate techniques for supervised domain adaptation for neural machine translation where an existing model trained on a large out-of-domain dataset is adapted to a small in-domain dataset. In this scenario, overfitting is a major challenge. We investigate a number of techniques to reduce overfitting and improve transfer learning, including regularization techniques such as dropout and L2...

متن کامل

Domain adaptation for Alzheimer's disease diagnostics

With the increasing prevalence of Alzheimer's disease, research focuses on the early computer-aided diagnosis of dementia with the goal to understand the disease process, determine risk and preserving factors, and explore preventive therapies. By now, large amounts of data from multi-site studies have been made available for developing, training, and evaluating automated classifiers. Yet, their...

متن کامل

Sample-oriented Domain Adaptation for Image Classification

Image processing is a method to perform some operations on an image, in order to get an enhanced image or to extract some useful information from it. The conventional image processing algorithms cannot perform well in scenarios where the training images (source domain) that are used to learn the model have a different distribution with test images (target domain). Also, many real world applicat...

متن کامل

A Domain Adaptation Regularization for Denoising Autoencoders

Finding domain invariant features is critical for successful domain adaptation and transfer learning. However, in the case of unsupervised adaptation, there is a significant risk of overfitting on source training data. Recently, a regularization for domain adaptation was proposed for deep models by (Ganin and Lempitsky, 2015). We build on their work by suggesting a more appropriate regularizati...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

CoRR

دوره abs/1105.0857 شماره

صفحات -

تاریخ انتشار 2011

Domain Adaptation: Overfitting and Small Sample Statistics

نویسندگان

چکیده

منابع مشابه

Domain Adaptation: A Small Sample Statistical Approach

Regularization techniques for fine-tuning in neural machine translation

Domain adaptation for Alzheimer's disease diagnostics

Sample-oriented Domain Adaptation for Image Classification

A Domain Adaptation Regularization for Denoising Autoencoders

عنوان ژورنال:

اشتراک گذاری